By definition, the subjective probability distribution of a random event
is revealed by the ("rational") subject's choice between bets, a view stressed
by F. Ramsey, B. De Finetti, L.J. Savage and traceable to E. Borel and, it can
be argued, to T. Bayes. Since hypotheses are not observable events, no bet can
be made, and paid off, on a hypothesis. The subjective probability distribution
of hypotheses (or of a parameter, as in the current "Bayesian" statistical
literature) is therefore a figure of speech, an "as if," justifiable in the
limit. Given a long sequence of previous observations, the subjective posterior
probabilities of events still to be observed are derived by using a
mathematical expression that would approximate the subjective probability
distribution of hypotheses, if these could be bet on. This position was

sessions of the Interdisciplinary Colloquium on Mathematics in
Behavioral Science, University of California at Los Angeles
(J. Marschak, chairman). A paper by Morris H. DeGroot, on
Stopping Rules, elicited comments on the following question.

Suppose all probabilities are defined as "subjective", "personal",
i.e., as being revealed by a "rational", "consistent" decision-maker's
choices under uncertainty: if he chooses to bet on one
rather than on another event, the former event is said to be the
subjectively more probable one. Moreover, a few rather plausible
quasi-logical postulates of "rationality" of choices imply that
such "subjective probabilities" have indeed the properties of a
mathematical "probability measure." Our question is: what
meaning, if any, can be assigned to the probability of a hypothesis,
law, or theory that is itself a probability distribution, so
that its falsity or truth is not, in general, an observable event
on which bets can be made and paid off? If an urn is sealed,
bets can be taken, both before and after some drawings are made,
on what the outcome of subsequent drawings will be. For these
outcomes will be observed and the bets paid off. But no bets
can be paid off on the content of the urn itself unless it is
unsealed. Most laws, theories, hypotheses are urns sealed
forever. Statisticians who speak of the prior and posterior
distribution of a statistical parameter use, in fact, a figure
of speech which deserves clarification. J. Marschak's comments
were therefore circulated to a number of workers in this field.
Several have answered. A "Round Robin" (the late L.J. Savage's
favorite term) resulted; its publication was permitted by all
contributors. They are (alphabetically):

Karl Borch              Koichi Miyasawa
Herman Chernoff         Paul Randolph
Morris H. DeGroot       L.J. Savage
Robert Dorfman          Robert Schlaifer
Ward Edwards            Robert L. Winkler
T.S. Ferguson

Students of the fundamentals of decision theory and statistical
inference know, and it is partly shown in the attached
Bibliography, that most of the contributors have continued to
work in the field.

As the reader will see, most respondents agreed that to
assign probability to the truth of a probabilistic theory is to
use a figure of speech, an "as if", that is justified in the limit.
For, given a sufficiently long sequence of observed events,
one can compute approximate subjective probabilities of events
still to be observed, by using as an intermediate step a mathematical
expression which could be called the subjective probability
distribution of hypotheses if it were possible to bet
on a hypothesis. A few of the respondents, however, denied the
existence of the problem itself, by allowing the term
"subjective probability" to be associated with a subject's naming a
number, and not necessarily with his choosing between decisions.




The topic of the present discussion is far from outdated.
Bruno DeFinetti has treated rather recently (1971) the
"Probability of a Theory and probabilities of Facts", with extensive
reference to an instructive example provided by I.J.
Good (1969).

I may be permitted to add two remarks of my own. One
concerns a question of history and of definition. It was raised
by L.J. Savage in response to Item 7 of my comments. Did
Thomas Bayes (1763) interpret probabilities as subjective ones,
revealed by the bets of a rational decision-maker? Was The
Reverend a "personalist", anticipating Borel (1924), Ramsey
(1926-28), DeFinetti (1937), Savage (1954)? Today's term
"Bayesian statistics" seems mostly to denote the use of prior and
the derivation of posterior probabilities (by "Bayes' Theorem");
in addition, viewing himself as a decision-maker, the "Bayesian
statistician" is supposed to concern himself with the posterior
expectation of the "loss" (the negative of the economists' "gross
payoff" or "benefit") and cost. But, to be truly "Bayesian", does
he not have, in addition, to interpret his prior and posterior
probabilities as subjective ones? It is not very important, of
course, what labels we use, provided we agree on their meaning.
Yet there may be some advantage if, in addition, we agree with
history when the label is a historical name. Let me quote
"Definition 5" of Bayes:

"The probability of any event is the ratio between the
value at which an expectation depending on the happening
of the event ought to be computed (J.M.'s italics), and
the value of the thing expected upon its happening".
Does not the "ought to" indicate a norm of behavior, stating
what we would now call "consistent", "rational" behavior? To
the value (u dollars, say) that you would gain if the uncertain
event happens, there "ought to" correspond in your mind a smaller
but sure value (c dollars) such that you are indifferent between
gaining u upon the happening of the uncertain event and gaining
c with certainty. Hence there ought to be in your mind also a
ratio of these two numbers, c/u = p (say), which Bayes calls
the probability of that uncertain event. This argues, I think,
for my ascribing to him the personalistic view. Note also that he
calls our c the "value at which an expectation depending on the
event ought to be computed". This actually agrees with the
modern use of ("mathematical") expectations, for Bayes obviously
assumes that the payoff if that uncertain event does not
happen is 0, so that indeed, if p = c/u then up + 0·(1-p) = c.
This interpretation of Bayes' term "expectation" is confirmed, for
the case when the bettor's loss > 0, in Bayes' "Proposition" 2:
"if a person has an expectation depending on the happening
of an event, the probability of the event is to the probability
of its failure as his loss is if it fails to his
gain if it happens".

Here "the person (who) has an expectation" considers a fair bet.
That is, denoting his gain and loss (both uncertain) by u and ℓ,
up + (-ℓ)(1-p) = 0, hence p/(1-p) = ℓ/u. If p were smaller the
consistent person would not accept the bet, given ℓ and u.
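
The arithmetic of the fair bet can be sketched in a few lines of Python. The function name and the sample stakes are illustrative assumptions, not part of the original text; the sketch only restates that a bet is fair when gain times p equals loss times (1 - p), with utility taken as linear in money.

```python
def probability_from_fair_bet(gain, loss):
    """Probability p at which winning `gain` or losing `loss` is a fair bet.

    Solves gain * p - loss * (1 - p) = 0 for p, assuming (as the text notes)
    that utility is linear in money.
    """
    if gain <= 0 or loss <= 0:
        raise ValueError("gain and loss must both be positive")
    return loss / (gain + loss)


# Hypothetical stakes: risk 1 dollar to win 3 dollars.
p = probability_from_fair_bet(gain=3.0, loss=1.0)
odds = p / (1.0 - p)  # equals loss / gain for a fair bet
```

With these hypothetical stakes the implied odds p/(1-p) reproduce the ratio of loss to gain, as in Proposition 2.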

As remarked in my Item 7, this presupposes, in terms of
modern decision theory, that utility is linear in the dollar
amounts, an assumption rejected by Bayes' contemporary Daniel
Bernoulli and his Petersburg paradox. This difficulty, not
present in the more general approaches of Ramsey and Savage,
was recognized by their fellow-personalist DeFinetti in the
English-language revision (1964) of his pioneering "La
prévision..." (1937). It is perhaps true that money utility
is almost linear for small monetary gains and losses, and personal
probabilities are revealed by the ratio ℓ/u in a bet accepted
by a consistent (e.g., an appropriately experienced) bettor:
this is believed by both DeFinetti and Savage (1962). Others,
e.g., Tversky (1972), believe that, typically, small monetary
differences fail to be discriminated by the subject. This results
in faulty elicitation of the subject's probabilities by the
observer.

My second remark refers to the "objective" probability of
my Item 6. I would now prefer the word "intersubjective". When
a person considers a coin to be "symmetrical", or a sequence of
trials of an experimenter to be "repeated and independent";
when, in short, a person regards those random events as
"exchangeable" in DeFinetti's sense, this is revealed by the
(rational) person's choice: if two events are exchangeable he is
indifferent on which of them to bet. If two or more persons, all
rational, agree that those trials are exchangeable, they will agree
(by definition) that certain prior probabilities are equal, although
they may disagree about their size. Moreover, even in
the presence of this latter, prior, disagreement (provided only
that they agree which events have non-zero probability), their
agreement about exchangeability of successive trials will entail,
when these trials are sufficiently numerous, an almost complete
agreement on the size of the subjective posterior probability
of each particular outcome of the experiment (and also on the
"as if" subjective probabilities of hypotheses, as mentioned
above). For this posterior probability will be approached by
that outcome's observed (hence "objective") relative frequency.
To put it differently: each of the rational subjects whose
prior choices reveal agreement that certain trials are exchangeable
will come closest to choosing the same bet a posteriori as
the best, if each of them computes expected utility on the basis
of observed relative frequencies (which are the same for all
subjects) used as posterior probabilities.
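
A small numerical sketch may make this convergence concrete. It assumes, purely for illustration, that two subjects hold different Beta priors over the chance of "red" in exchangeable trials; the Beta family, the function names and the numbers are assumptions and are not taken from the text.

```python
def predictive_red(prior_a, prior_b, reds, trials):
    """Posterior probability that the next trial is red under a Beta(prior_a, prior_b) prior."""
    return (prior_a + reds) / (prior_a + prior_b + trials)


def disagreement(prior_1, prior_2, reds, trials):
    """Absolute gap between two subjects' posterior probabilities of red."""
    return abs(predictive_red(*prior_1, reds, trials)
               - predictive_red(*prior_2, reds, trials))


# Two hypothetical subjects: one starts indifferent, one starts expecting mostly red.
subject_1, subject_2 = (1.0, 1.0), (8.0, 2.0)

# Same observed relative frequency of red (30%) at growing sample sizes.
gaps = [disagreement(subject_1, subject_2, 3 * k, 10 * k) for k in (1, 10, 100, 1000)]
```

Under these assumptions the gap between the two subjects shrinks as the number of trials grows, and both posterior probabilities approach the observed relative frequency.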

Jacob Marschak
May 1974
University of California, Los Angeles
Stopping Rules

(Outline)

Consider an experimenter who can take independent observations
$X_1, X_2, \ldots$ sequentially from a population whose distribution involves
some unknown parameters. After each observation $X_n$ the experimenter
can either stop sampling and receive a specified reward $r(X_1, \ldots, X_n)$
that depends on the values $X_1, \ldots, X_n$ that he has observed, or pay a
specified price (typically a constant cost per observation) and observe
$X_{n+1}$. It may also be true that at some stage (possibly random) the
experimenter is forced to stop sampling and accept his reward. The
total gain of the experimenter when he stops is the reward $r(X_1, \ldots, X_n)$
that he receives minus the amount spent on sampling. The problem is to
find a stopping rule that maximizes his expected total gain.

Distributional Assumptions and Notation

In the problems to be discussed, it is assumed that the observations
$X_1, X_2, \ldots$ are normally distributed with unknown mean $\theta$ and variance 1.
The parameter $\theta$ is assumed to have a normal prior distribution with
mean $\mu$ and variance $\sigma^2$. Symbolically,

$$\theta \sim N(\mu, \sigma^2).$$




The marginal distribution of an observation $X$ is then $N(\mu, 1 + \sigma^2)$ and
the posterior distribution of $\theta$ given $X = x$ is

$$N\!\left(\frac{\mu + \sigma^2 x}{1 + \sigma^2},\ \frac{\sigma^2}{1 + \sigma^2}\right).$$
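
Under the assumptions just stated (a normal prior on the mean and unit-variance normal observations), the passage from prior to posterior is the standard conjugate-normal update. The following is a minimal sketch of that update; the function names are hypothetical and the code is not part of the outlined talk.

```python
def posterior_after_observation(mu, sigma2, x):
    """Update a N(mu, sigma2) prior on theta after observing x ~ N(theta, 1).

    Returns (posterior mean, posterior variance).
    """
    posterior_mean = (mu + sigma2 * x) / (1.0 + sigma2)
    posterior_variance = sigma2 / (1.0 + sigma2)
    return posterior_mean, posterior_variance


def update_sequence(mu, sigma2, observations):
    """Apply the one-step update to each observation in turn."""
    for x in observations:
        mu, sigma2 = posterior_after_observation(mu, sigma2, x)
    return mu, sigma2
```

Applying the one-step update repeatedly gives the position parameters of the experimenter after any number of observations.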


The symbols $\varphi$ and $\Phi$ denote the density function and the distribution
function of the $N(0,1)$ distribution. The following function $\psi$
occurs throughout the discussion:

$$\psi(t) = \int_t^\infty (x - t)\, d\Phi(x) = \varphi(t) - t[1 - \Phi(t)].$$

It has the property that

$$\psi(-t) = \psi(t) + t$$

and its derivative is given by

$$\psi'(t) = -[1 - \Phi(t)].$$

The inverse function $\psi^{-1}$ is also used. For any numbers $x$ and $y$, write

$$x \vee y = \text{(the maximum of } x \text{ and } y\text{)}.$$
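
For readers who wish to check the stated properties numerically, the function can be computed from the standard normal density and distribution function. This is only a sketch using Python's standard library; the names `phi`, `Phi` and `psi` are chosen here for convenience.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def psi(t):
    """phi(t) - t * (1 - Phi(t)), i.e. E[max(X - t, 0)] for X ~ N(0, 1)."""
    return phi(t) - t * (1.0 - Phi(t))
```

The function is positive and decreasing, which is why an inverse can be used.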


Some  Specific  Problems 


1. Sampling without recall. In this problem the experimenter's reward
when he stops is $X_n$, the value of the last observation that he has taken.
There is a fixed cost $c$ per observation. The experimenter's total gain
when he stops is $X_n - nc$. His position at any stage of the sampling process
is described by the triple $(r, \mu, \sigma^2)$, where $r$ is the reward that he
will receive if he stops without further sampling and $(\mu, \sigma^2)$ are the
parameters of the current posterior distribution of $\theta$.

Let $V(r, \mu, \sigma^2)$ denote the value to the experimenter of being in
position $(r, \mu, \sigma^2)$. It is the expected total gain following the optimal
procedure from position $(r, \mu, \sigma^2)$.

2. Forced stopping. In this problem there is no cost of sampling.
However, at any stage of the process, if the experimenter decides to
continue sampling, there is a fixed probability $p$ ($0 < p < 1$) that he
will be forced to stop after the next observation and accept its value
as his reward. His total gain when he stops is $X_n$. His position
$(r, \mu, \sigma^2)$ at any stage and its value $V(r, \mu, \sigma^2)$ are as defined above.

3. Sampling with recall. If the experimenter stops sampling after having
observed $X_1, \ldots, X_n$, his reward is $X_1 \vee X_2 \vee \cdots \vee X_n$, the largest of
the observations that he has taken. There is a fixed cost $c$ per
observation. The experimenter's total gain when he stops is

$$X_1 \vee \cdots \vee X_n - nc.$$

His position $(r, \mu, \sigma^2)$ at any stage and its value $V(r, \mu, \sigma^2)$ are as
defined above.

4. Variations on the above. Further problems involve (i) discounted observations,
(ii) guaranteed minimum rewards, (iii) a choice of populations
from which to sample at each stage, (iv) a reward function of the form

$$X_n \vee X_{n-1} \vee \cdots \vee X_{n-k}.$$
J. MARSCHAK


1. The operational meaning of personal probabilities is easily established
when they are defined on the set of events that will influence
the consequences of an ideally "consistent" ("rational") person's actions.
That is, the personal probability P(Z) of event Z for consistent
Mr. Smith can be revealed (or at least approximated, as I shall explain
presently) by his choices: not just by the purely linguistic exercise of
asking him to name a number, as has been done by some contemporary
experimenters, with results that can be used to predict future words, not
future actions.


2. However, "Bayesian statisticians" use, in addition, personal ("prior")
probabilities defined, not on the set of events, but on the set of probability
distributions over this set. For example, DeGroot's $\mu$ and $\sigma$
characterize a (normal) prior distribution of the parameter $\theta$ which, in
turn, characterizes the distribution of the random sequence (x) of identically
distributed variables, whose values do affect the decision-maker's
payoff.

3. The operational meaning of a prior distribution of distributions,
in terms of the decision-maker's choices, is not obvious. Yet, after
recapitulating the meaning of personal (and also of objective) probabilities
of events, I shall try to extend it to the prior distributions
of distributions, much inspired by a brief exchange of opinions with
our speaker today.


4. Let X be the set of states x of Nature (not controlled by the
decision-maker). When x is in the subset Z of X we say that event
Z has happened. Thus a probability measure P on X will determine the
probabilities P(Z), P(Z'), ... of the events Z, Z', ... When a person
takes an action whose consequence if Z happens is preferable to its
consequence if Z does not happen, we say that he bets on Z.

5. Now, a probability measure P on X is called personal with respect
to Mr. Smith if, given any two events Z, Z', he prefers to bet on Z
rather than on Z' whenever P(Z) > P(Z'), and is indifferent whenever
P(Z) = P(Z'). [Note again: personal probability is revealed by what the
person does, not by his stating verbally a number!] It has been shown by
F. Ramsey (1926-28), B. De Finetti (1937), L.J. Savage (1954) that, for a
person obeying the rules of logic supplemented by a few plausible consistency
postulates, personal probabilities of events, in the sense just stated,
do exist, along with a numerical utility function on the set of consequences
of his actions. Utility is defined as a variable whose expectation (computed
on the basis of personal probabilities) the person's chosen action
will maximize over the set of all available actions.

6. Moreover, in the case of certain ideally symmetrical, interchangeable
events (ideal coins, ideally repeatable samples) rules of logic will make
the personal probability of an event the same for all consistent persons, thus
making this probability "objective".

7. T. Bayes himself (1763), followed by De Finetti [and also by R. Carnap
(1962) in a paper presented in this Colloquium, March 10, 1961] considered
personal probabilities as revealed by choices between bets involving
different monetary odds. Thus P(Z) = p for a consistent Mr. Smith, if
he accepts any bet on Z in which the ratio of his money losses to his
money gains is smaller than p/(1-p). But this presupposes, too narrowly,
a linear utility function of money. Let me state a more general approach.
First establish that Smith is indifferent between betting on a clock's
twirled hand's stopping within any two equal arcs of the circumference.
This will reveal that he assigns certain ideal symmetry properties to
the physical mechanism used. Now following the spirit of a suggestion
by E. Borel (1924): if Smith prefers betting on event Z ("rain tomorrow")
to betting on the hand's stopping within a 30° arc but not to betting on
its stopping within a 60° arc, then for him

1/12 ≤ P(Z) ≤ 1/6,

and so on in an obvious succession of steps. This "Borelian" procedure
is analogous to that of the ear-doctor's assessing your hearing capacity,
or the analytical chemist's titration (and is subject to the same limitations
except as an ideal).
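
The succession of steps can be sketched as a bisection on the size of the arc. Bisection is an assumption made here for concreteness (the text fixes only the comparison with 30° and 60° arcs), and the simulated subject below is hypothetical.

```python
def borelian_bracket(prefers_event_to_arc, steps=10):
    """Bracket P(Z) by repeated comparisons with bets on dial arcs.

    `prefers_event_to_arc(fraction)` must return True when the subject prefers
    betting on Z to betting on the hand stopping within an arc that covers
    `fraction` of the circumference.
    """
    low, high = 0.0, 1.0
    for _ in range(steps):
        middle = (low + high) / 2.0
        if prefers_event_to_arc(middle):
            low = middle   # Z judged more probable than the arc
        else:
            high = middle  # Z judged no more probable than the arc
    return low, high


# Hypothetical consistent subject whose personal probability of Z is 0.13.
def subject(fraction):
    return 0.13 > fraction


low, high = borelian_bracket(subject, steps=10)
```

Each comparison halves the bracket, so ten comparisons with an ideally consistent subject locate P(Z) within 1/1024; with a real subject the limitations mentioned above apply.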

8. Consider now n urns in which the proportions, p, of red balls are
equal, respectively, to $p_1, \ldots, p_n$. Suppose you know that one and only
one of the n urns is being used in a sequence of drawings. Let $f(p_i) = f_i$
be the personal probability you assign to the event ("hypothesis") that
the urn used is the i-th urn. The function $f(\cdot)$ is your prior probability
of the parameter p viewed as an n-valued random variable. We have

$$\sum_{i=1}^{n} f_i = 1; \qquad f_i \equiv f(p_i) \ge 0, \quad i = 1, \ldots, n. \tag{1}$$

You will never find which of the n events actually takes place, i.e.
which urn is being used. Therefore the numbers $f_i$ cannot be established
directly by asking you to choose between bets on the i-th event and on
the arcs of our dial.

9. But here is an indirect method. We shall draw balls, with
replacement. Before the first drawing, I use on you the Borelian
procedure to establish your personal probability that the ball will be
red; call this number $r_1$. Then clearly the numbers $f_i$ are related to
known numbers $r_1, p_1, \ldots, p_n$ by the relation

$$r_1 = \sum_{i=1}^{n} f_i p_i, \tag{2}$$

subject to the constraints (1). Let $x_t = 1$ or 0 according as the t-th
ball drawn is or is not red. Applying the Bayes theorem (an identity
implied by the definition of conditional probabilities), the n posterior
probabilities, given the result of the first drawing, are (summing
here and henceforth over i)

$$\frac{f_i p_i}{\sum f_i p_i} \ \text{ if } x_1 = 1, \qquad \frac{f_i (1 - p_i)}{\sum f_i (1 - p_i)} \ \text{ if } x_1 = 0, \qquad i = 1, \ldots, n.$$

Your personal probability, after the first drawing, that the second
drawing will be red ($x_2 = 1$) is therefore equal to

$$r_2 = r_2(1) = \frac{\sum f_i p_i^2}{\sum f_i p_i} \ \text{ if } x_1 = 1,$$

$$r_2 = r_2(0) = \frac{\sum f_i p_i (1 - p_i)}{\sum f_i (1 - p_i)} \ \text{ if } x_1 = 0.$$

I know $x_1$; and I establish your $r_2$ by the Borelian procedure; we
have now added one more equation to (1) and (2), with the $f_i$ as unknown
and $r_1, r_2, p_1, \ldots, p_n$ known. In general, writing

$$y_t = x_1 + x_2 + \cdots + x_t, \quad t = 1, 2, \ldots; \qquad y_0 = 0,$$

your personal probability (ascertainable by the Borelian procedure),
$r_{t+1}$, that the (t+1)-th ball will be red is

$$r_{t+1} = r_{t+1}(y_t) = \frac{\sum f_i\, p_i^{\,y_t + 1} (1 - p_i)^{t - y_t}}{\sum f_i\, p_i^{\,y_t} (1 - p_i)^{t - y_t}}, \qquad t = 0, 1, \ldots \tag{3}$$


If $t + 1 = n - 1$, i.e. if the Borelian procedure was applied n-1 times
(i.e., n-2 drawings were made) we have as many equations as we need to
determine $f_1, \ldots, f_n$, subject to (1). The numbers r elicited from
the consistent man must also be compatible with the inequality in (1).
Moreover, his values for $r_n, r_{n+1}, \ldots$ must be such that the system of
equations and the inequality are satisfied by some values $f_1, \ldots, f_n$.
Thus if n = 2 and $p_1 \neq p_2$ then (without loss of generality)
$p_1 \ge r_1 \ge p_2$ and

$$f_1 = \frac{r_1 - p_2}{p_1 - p_2}, \qquad f_2 = \frac{p_1 - r_1}{p_1 - p_2},$$

and hence, for n = 2, all the subsequent values $r_2, r_3, \ldots$ are fixed without
ever making a drawing, provided you are "consistent".
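
The two-urn case can be worked through numerically. The sketch below assumes hypothetical urns with 80% and 20% red balls and a hypothetical first response of 0.5; the function names are introduced here for illustration only. It computes the prior from the first response and then the second responses that consistency dictates.

```python
def predictive_red(f, p, t, y):
    """Equation (3): probability that draw t+1 is red after y reds in t draws.

    f[i] is the prior probability of urn i and p[i] its proportion of red balls.
    """
    numerator = sum(fi * pi ** (y + 1) * (1 - pi) ** (t - y) for fi, pi in zip(f, p))
    denominator = sum(fi * pi ** y * (1 - pi) ** (t - y) for fi, pi in zip(f, p))
    return numerator / denominator


def prior_for_two_urns(r1, p1, p2):
    """Solve r1 = f1*p1 + f2*p2 with f1 + f2 = 1 (the case n = 2)."""
    f1 = (r1 - p2) / (p1 - p2)
    return f1, 1.0 - f1


# Hypothetical elicitation: urns with 80% and 20% red, first response r1 = 0.5.
f = prior_for_two_urns(r1=0.5, p1=0.8, p2=0.2)
r2_after_red = predictive_red(f, (0.8, 0.2), t=1, y=1)
r2_after_not_red = predictive_red(f, (0.8, 0.2), t=1, y=0)
```

A subject who gave the first response 0.5 but then named second responses other than these two values would, on this account, be inconsistent.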


However, let p range continuously from 0 to 1. Then the vector
$[f_i]$ is replaced by a density function f(p); or, more generally, we try
to find the distribution function F(p) from the functional equations

$$r_{t+1} = \frac{\int_0^1 p^{\,y_t + 1} (1 - p)^{t - y_t}\, dF(p)}{\int_0^1 p^{\,y_t} (1 - p)^{t - y_t}\, dF(p)}, \qquad t = 0, 1, \ldots, T. \tag{3'}$$

Can we conjecture that, with T finite, F cannot be ascertained from
the sequence $r_1, \ldots, r_{T+1}$ exactly? And that an approximate solution
will, in some sense, converge to F as T increases?

10. Now return to the case of a finite number n of hypotheses (i.e.
of values of the parameter p). We can modify our procedure so that
no actual drawings need take place, even when n > 2. I can use the
Borelian dial to establish the personal probability $q(t, y_t)$ which you
assign to the following event: in a sequence of t drawings with
replacement, $y_t$ balls will be red. Then

$$q(t, y_t) = \sum_{i=1}^{n} f_i\, p_i^{\,y_t} (1 - p_i)^{t - y_t} \binom{t}{y_t}; \tag{4}$$

here the integers t and $y_t$ can vary arbitrarily, independently of
n, provided $t \ge 1$ and $0 \le y_t \le t$. We can therefore produce as many
as n equations of the form (4), all linearly independent in the $f_i$.

If you are consistent, the $f_i$ will satisfy the constraints (1).
Moreover any additional equation of the form (4) elicited by another
application of the Borel dial should be satisfied by the same values
of the $f_i$.
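
As a sketch of how n such equations pin down the prior, the following solves the linear system exactly with rational arithmetic. The three urns, the prior used to simulate the responses, and the choice of pairs (t, y) are all hypothetical; the elimination routine is a generic helper, not a method proposed in the note.

```python
from fractions import Fraction
from math import comb


def q(f, p, t, y):
    """Equation (4): probability of y red balls in t drawings with replacement."""
    return sum(fi * comb(t, y) * pi ** y * (1 - pi) ** (t - y) for fi, pi in zip(f, p))


def solve_linear(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Raises StopIteration if the equations are not linearly independent.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [value / rows[col][col] for value in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def recover_prior(p, pairs, responses):
    """Find the f_i from responses q(t, y), one (t, y) pair per equation."""
    matrix = [[comb(t, y) * pi ** y * (1 - pi) ** (t - y) for pi in p] for t, y in pairs]
    return solve_linear(matrix, responses)


# Hypothetical example with n = 3 urns and a prior used only to simulate responses.
p = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
true_f = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
pairs = [(2, 0), (2, 1), (2, 2)]
responses = [q(true_f, p, t, y) for t, y in pairs]
recovered = recover_prior(p, pairs, responses)
```

Any further response elicited from the same subject can then be compared with the value of `q` computed from the recovered prior, which is the consistency check described above.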

11. Again, similarly to the case of (3'), no finite number of such
applications will suffice to determine the distribution function F(p)
when p is continuous and (4) becomes

$$q(t, y_t)\, \frac{y_t!\,(t - y_t)!}{t!} = \int_0^1 p^{\,y_t} (1 - p)^{t - y_t}\, dF(p), \tag{4'}$$

with a known number on the left side and a sort of weighted Beta-function
(with unknown weights) on the right side. Again: will successive
approximations converge to F as the number of pairs $(t, y_t)$ increases?

12. All this was intended, not to suggest psychological experiments
but merely to show that operational meaning can be attached to prior
probabilities of statistical parameters. Psychological experiments,
observing $q(t, y_t)$ in (4), will most probably yield the result that no
mortal is consistent. Except, of course, the ideal statistician.

13. In a subsequent discussion, Professor Glenn Graves raised the
following subtle question. The original consistency postulates have been
defined for events on which direct bets were possible. The prior and
posterior probabilities of the Bayes theorem had to be understood accordingly.
On the other hand, in stating that the consistent man's responses
$r_t$ and $q(t, y_t)$ must satisfy certain constraints (as in the paragraphs following
equation (3) and equation (4), respectively), Bayes theorem was applied
to "events" (viz., sets of values of the parameter p) on which only
indirect bets can be taken, in the sense described above. But: do those
constraints follow from the original postulates, or must the latter be
replaced by stronger ones?

A  Circular  Letter 

I  would  appreciate  it  if  you  would  give  your  opinion  on  the  question 
raised  in  the  enclosed  mimeographed  note  of  mine.  It  was  written  in 
connection  with  a  talk  delivered  by  Morris  DeGroot  in  which  he  used  the 
Bayesian  approach.  The  note  is  only  loosely  related  to  the  special  topic 
of  that  talk  (Stopping  Rules) .  It  deals  with  a  general  difficulty  which 
has  bothered  me  and,  in  the  last  paragraph,  with  a  further  logical  ques¬ 
tion  raised  by  G.  Graves. 

If  you  would  care  to  comment,  kindly  state  in  the  body  of  your  letter 
or  on  the  enclosed  paper,  whether  you  would  permit  me  to  circulate  your 
comments  to  these  same  people. 

9 May 1966                              University of California,
                                        Los Angeles
KARL BORCH

The mathematics of your intriguing note about personal probabilities
is obviously related to what the statisticians used to call "The Problem
of Moments" some 30 years ago.

Your equations (3') make it possible to determine the T + 1 first
moments of the distribution F(p) or, if you like, the T + 1 first coefficients
in the power series of the characteristic function. The other
coefficients can be chosen arbitrarily. Hence there will be an infinity
of distributions F(p) compatible with a consistent sequence $r_1, \ldots, r_{T+1}$.

The convergence should follow from the fact that a distribution is
uniquely determined by its moments or, if you prefer, by its characteristic
function.
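
The link between the elicited responses and the moments can be illustrated on the path where every ball drawn is red: there each response is a ratio of successive moments of F, so the moments are running products of the responses. The sketch below uses two hypothetical discrete distributions and hypothetical function names; it is an illustration, not part of the letter.

```python
from fractions import Fraction
from itertools import accumulate
from operator import mul


def moment(support, weights, k):
    """k-th moment of a discrete distribution on `support` with `weights`."""
    return sum(w * p ** k for p, w in zip(support, weights))


def all_red_responses(support, weights, count):
    """Responses r_1, ..., r_count elicited while every ball drawn so far is red."""
    return [moment(support, weights, t + 1) / moment(support, weights, t)
            for t in range(count)]


def moments_from_responses(responses):
    """Recover m_1, m_2, ... as running products of the responses."""
    return list(accumulate(responses, mul))


half, quarter = Fraction(1, 2), Fraction(1, 4)
point_mass = ([half], [Fraction(1)])                # all mass at p = 1/2
two_point = ([quarter, 3 * quarter], [half, half])  # mass at 1/4 and at 3/4

# Both distributions give the same first response, so T = 0 cannot separate them.
same_first = all_red_responses(*point_mass, 1) == all_red_responses(*two_point, 1)
```

The two distributions agree on the first response and part company only at the second, a small instance of many distributions being compatible with a finite sequence of responses.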

These superficial remarks do of course gloss over a number of difficulties.
The moments determined by (3') may, for instance, give a distribution
defined over a greater domain than (0,1). I am not certain how
this should be interpreted.

You should, of course, feel free to circulate this letter if you
think it is useful.

6 June 1966                        The Norwegian School of Economics
                                   and Business Administration, Bergen

HERMMgWF 

The  question  you  raised  in  the  colloquium  comes  up  also  in  problems 
of  Empirical  Bayes  or  Compound  Decision  Theory.  There  is  a  basic  Identi- 
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fiability  problem  underlying  it.  In  the  binomial  case  one  can  estimate 
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F(p)  by  knowing  its  moments.  Thus  in  equation  (4'),  the  sequence 
q(t,t)  -  /q  p*  dF(p)  t  *=  1,2,... 

determines  F,  and  a  finite  subsequence  can  be  used  to  approximate  F. 

In  Compound  Decision  problems  it  is  typical  that  one  is  mainly 
interested  in  a  particular  function  of  F,  e.g.,  /pdF(p) .  Suppose  as  a 
typical  problem,  that  n  coins  arrive  each  with  a  possibly  different  value 
for  the  probability  of  head,  p,  (presumed  to  be  independently  selected 
from  an  unknown  distribution  F(p)).  Suppose  each  coin  is  tossed  once. 

Then  the  proportion  of  heads  observed  is  an  estimate  of  /pdF(p). 

2 

If  the  appropriate  decisions  depended  on  /p  dF(p),  the  experimental 
setup  described  above  seems  inadequate  to  get  a  "good"  estimate  of  the 
desired  quantity. 

Another example of lack of identifiability which is near and dear
to me stems from a problem in scoring multiple choice questionnaires.
Suppose a question can be answered True or False. Assume that each
student answers the question correctly if he knows the answer and guesses
at random otherwise. If 50% of the students answer incorrectly, it is
evident that almost no one knew the answer and an appropriate procedure
would be to mark everyone wrong even if they achieved the correct answer.
The appropriate way of handling the students depends on your estimate of
$\lambda$, the proportion of the students who know the correct answer; and
$\lambda$ may be estimated in terms of the directly observable proportion of
students who answer correctly. A more sophisticated version gives lack of
identifiability. Suppose that there are three choices A, B, C of which A is
correct. The model assumes that students are of four types: those who
know the answer and answer correctly (proportion $\lambda_1$); those who know
the answer is A or B and guess with probability 1/2, 1/2 between them
(proportion $\lambda_2$); those who know the answer is A or C and guess with
probability 1/2, 1/2 between them (proportion $\lambda_3$); and finally those
with no knowledge, who guess with equal probability among A, B, C
(proportion $\lambda_4 = 1 - \lambda_1 - \lambda_2 - \lambda_3$). If $\lambda_1$,
$\lambda_2$, $\lambda_3$, $\lambda_4$ were known, a "good"
way to grade the individual students would be easily determined. The
data only supply enough information to estimate two of the three unknown
parameters. The multiple choice scorer can resolve his question partially
by using a minimax approach. However, for the Bayesian to recover the
entire prior distribution, it is required that a sufficient body of
experimental data be available to be tapped.
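
The lack of identifiability in the three-choice model can be made concrete. The sketch below is illustrative only: the function name `answer_proportions` and the two parameter settings are assumptions chosen for the example. It computes the observable proportions of answers A, B, C from the four type proportions and exhibits two different populations that produce exactly the same data.

```python
from fractions import Fraction as Fr

def answer_proportions(l1, l2, l3):
    """Observable proportions of answers (A, B, C) under the four-type model.

    l1: know the answer; l2: know it is A or B; l3: know it is A or C;
    the remaining l4 = 1 - l1 - l2 - l3 guess uniformly among A, B, C.
    """
    l4 = 1 - l1 - l2 - l3
    p_a = l1 + l2 / 2 + l3 / 2 + l4 / 3
    p_b = l2 / 2 + l4 / 3
    p_c = l3 / 2 + l4 / 3
    return p_a, p_b, p_c

# Two hypothetical populations (assumed values, for illustration only).
first = answer_proportions(Fr(4, 10), Fr(2, 10), Fr(2, 10))
second = answer_proportions(Fr(9, 20), Fr(1, 10), Fr(1, 10))

if __name__ == "__main__":
    print(first)
    print(second)
```

Since the three observed proportions sum to one, they carry only two independent pieces of information about the three free parameters, which is why distinct parameter settings can agree on everything observable.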

3  June  1966  Stanford  University 

Professor Marschak's comments are very closely related to the
important work of de Finetti (1937), revised and translated in Kyburg
and Smokler (1964), and discussed also by Savage (1954), Ch. 3, Sec. 7.
This work, which gives strong support to the Bayesian theory of
statistics, shows that if a person's probabilities on the outcomes of a
sequence of coin tossings satisfy certain conditions of symmetry, or
exchangeability, then his probabilities can be represented by the
conditional distributions given a fictional "unknown p" of the coin,
together with a unique "prior distribution of p." I agree with
Professor Marschak in emphasizing that, in general, an infinite sequence
of tosses is required to learn this prior distribution.

28  February  1966  Carnegie  Institute  of  Technology 

and  Stanford  University 

Thank  you  for  sending  me  your  note  on  personal  probabilities  of 
probabilities.  I  found  it  very  ingenious  and  very  convincing,  and  about 
as  lucid  a  description  of  how  to  generate  subjective  probabilities  as  I 
have  yet  seen. 

My only slight divergence from you is my conviction that the world
about which we make decisions consists of nothing but probability
distributions. If I bet indifferently on red and black at roulette, I am
disclosing my belief that the wheel is well-balanced, whereas my neighbor
who bets a system discloses his belief that the wheel obeys a nonstationary
stochastic process. If I seem to bet on "rain tomorrow" I am really
expressing my belief that today's weather system is such that probability
of rain tomorrow is high. In this sense all we ever reveal is our
personal probabilities of probabilities, so that the case you discuss is
the fundamental case that people have been discussing (mostly implicitly)
all along. What the recorded experiments disclose is actually your $r_t$
rather than $p_i$.

No  objection  to  circulation  if  you  think  it  worthwhile.  My  personal 
probability  of  the  probability  of  that  event  is  low. 

19 May 1966  Harvard University

WARD EDWARDS

I have read with much interest your comments on DeGroot's paper,
and have a few comments. Please feel free to make any use of them that
may be convenient.

I am in no serious disagreement with your conclusion, or with the
arguments that lead you to it. However, I feel that you are working
unnecessarily hard by defining an unnecessarily restricted universe to
be interested in. You have chosen to talk only about questions in which
the population characteristic of interest is not knowable. Many questions
are like that, at least in practice. But many others are not at all
like that, and the situations in which you can conveniently get at the
population parameter offer useful conceptual devices for thinking about
those in which you cannot.

The  problem  arises  first  in  your  second  paragraph,  where  you  talk 
of  prior  distributions  as  sets  of  distributions  over  sets  of  events. 

When  I  think  of  a  prior,  or  a  posterior,  distribution,  I  think  of  it  as 
my  opinion  about  some  well-defined  issue  about  which  I  am  uncertain. 

If, for example, I am uncertain about whether this bookbag contains 700
red and 300 blue chips, or 700 blue and 300 red, my uncertainty should
be describable by a number. (It is almost meaningless to call this
number a prior or a posterior distribution or opinion. All opinions
are posterior to some information and prior to other information. Prior
distributions occupy no special status that I know of in Bayesian thinking.)

My opinions can be modified by means of information that for me
bears on them. Sometimes that information bears on them so potently
(in my opinion) that my posterior distribution would approach 1 (for
discrete hypotheses) or would have a peak higher than any preassigned
number (for continuous hypotheses). In the example, such an item of
information could be obtained by dumping out the bag and counting the
chips. Other kinds of information, such as might be obtained by sampling
with replacement, are less convincing. I see no sharp formal lines
differentiating overwhelming from non-overwhelming evidence. Of course,
such lines can easily be constructed, and it makes rather little
difference what choice of operational definition of "overwhelming" is used,
so long as the arbitrariness of that definition is recognized. For
illustration, I shall arbitrarily define evidence overwhelming for a
discrimination between two hypotheses as evidence sufficient to change
prior odds of 1:1 into posterior odds of at least 1,000,000:1 as between
that pair of hypotheses. For me, counting the chips in the bookbag easily
meets that test, given the truth of the model of the data-generating
process that I am tentatively working within.
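
To see how the 1,000,000:1 criterion plays out for sampling with replacement, here is a minimal sketch. The function names and default arguments are assumptions introduced for this illustration; the only inputs taken from the letter are the two bookbag compositions (700 red and 300 blue versus 700 blue and 300 red) and the odds threshold.

```python
import math

def posterior_odds(prior_odds, reds, blues, p_red_h1=0.7, p_red_h2=0.3):
    """Posterior odds of H1 against H2 after draws with replacement.

    H1: proportion p_red_h1 of the chips are red; H2: proportion p_red_h2.
    """
    ratio_red = p_red_h1 / p_red_h2
    ratio_blue = (1 - p_red_h1) / (1 - p_red_h2)
    return prior_odds * ratio_red ** reds * ratio_blue ** blues

def net_reds_needed(target_odds=1_000_000, p_red_h1=0.7):
    """Excess of reds over blues needed to move 1:1 odds to target_odds.

    Assumes the symmetric pair of hypotheses (p_red_h2 = 1 - p_red_h1),
    so that one red and one blue chip cancel exactly.
    """
    per_chip = math.log(p_red_h1 / (1 - p_red_h1))
    return math.ceil(math.log(target_odds) / per_chip)

if __name__ == "__main__":
    n = net_reds_needed()
    print(n, posterior_odds(1.0, n, 0))
```

Under these assumptions a sample meets the arbitrary test once reds outnumber blues by the returned excess, however long the sample is, whereas dumping out the bag and counting settles the matter at once.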

Opinions characterize me, not the bookbag. My opinions about either
the proportion of red chips in the bookbag or the probability that the
next chip to be sampled will be red can be defined, and measured, only
by observing my behavior. (Discussion of what behavior to observe comes
later in this letter.) I see absolutely no formal difference between my
opinions about the population parameter and about the identity of the
next sample. There is, of course, an important practical point: I have
a formal model that implies, for each possible bookbag composition, what
the probability is that the next sample will be red. I hold that model
as a working hypothesis with high probability. Thus some internal
consistency rules link opinions about samples (data) with opinions about
bookbag compositions (hypotheses). However, these internal consistency
rules work in both directions; specification of P(D|H) by itself
specifies neither P(H) nor P(D), but to some extent constrains both, given
the existence of some data.

Now we come to the question of what observations you might make in
order to discover my opinions. Such observations are numerous. By far
the simplest procedure, of course, is simply to ask me. For some reason
you object to this. You make a distinction that I cannot understand
between what you call words and what you call actions. Why aren't words
actions? You seem to feel that if I discover that a subject is indifferent
between betting on A and betting on not-A, then I am justified in saying
that A and not-A are equally likely, for him. Yet you deny him the
privilege of making, or at any rate of communicating to you, the same
inference.

I feel that many different actions, some performed with tongue and
some not, some with immediate consequences and some not, are suitable
for indicating what my, or anyone else's, opinions are. If the person
being studied is an ideally consistent man, then those opinions will obey
all appropriate consistency rules, including the ones that permit
specification of coherence between words and betting behavior. If he is a
real man, he will of course be inconsistent. It is an empirical question,
not to be answered from the armchair, whether that subset of his behaviors
defined by linguistic responses having no immediate consequences will or
will not be consistent with other subsets of his behavior, such as are
studied in so-called choice experiments. I know of absolutely no
empirical evidence indicating that such verbal responses are less consistent
with choices than choices are with one another; in fact, if anything I
think the evidence is in the other direction.

The set of responses you have proposed is, of course, in principle
a usable set. Of course, it doesn't really meet your objection to the
interpretation in terms of bets without the auxiliary Borel dial. You
require, as a first step, establishment of indifference between equal
arcs. Indifference is not observable in human behavior; it must be
inferred from easily switchable preferences, as in the Davidson, Suppes,
and Siegel experiment. But given that kind of procedure and a consistent
subject, nothing more is required to obtain utility-of-money functions.
And given those functions, the definition of odds in terms of acceptable
bets works fine, using utiles instead of dollars in the relevant ratios.
In other words, I see no reason for constant reference to the Borel dial.
Instead, the indifference-defining operation can be used to measure the
utility of money, and thereafter utilities instead of dollars can be used
in a bet-based definition of probability. Or, instead, you could simply
ask consistent people to estimate numbers... Of course any of these
procedures, applied to real men, will produce inconsistencies. Any single
procedure will produce internal inconsistencies; any pair of procedures
will produce inconsistencies between the results of the two procedures.
From this point of view, I hold no one procedure inherently more valid
than any other. Your procedure of using sample urns, observing prior
and posterior distributions, and inferring what the "first" prior must
have been, is feasible. But why be so roundabout? Why deny yourself
the freedom to dump out those urns and count the balls in them? Then
you can indeed observe what the true event is, and so can settle bets.

I  see  no  difference  between  that  procedure  and  the  sampling  procedure 
you  propose,  except  that  the  sampling  procedure  is  a  lot  more  complex. 
But,  of  course,  I  see  no  difference  of  formal  or  philosophical  status 
between  the  events  that  I  have  been  calling  data  and  those  that  I  have 
been  calling  hypotheses.  And  that  brings  us  back  to  where  we  started. 

I  know  nothing  about  Professor  Graves's  question. 

18  May  1966  University  of  Michigan 

T. S. FERGUSON

Here  are  the  comments  you  requested  on  your  remarks  on  Probabilities 
of  Distributions. 

1. Let $X_1, \ldots, X_n$ be independent Bernoulli trials with probability
$\theta$ of success, and let $\tau(\theta)$ be a prior distribution. The
mathematical problem may be stated: given sufficiently accurate information
on the marginal distribution of $X_1, X_2, \ldots, X_n$, can one determine
$\tau(\theta)$ sufficiently accurately? For fixed n, obviously not: for
n = 1, all that can be determined accurately is
$P(X = 1) = \int \theta\,d\tau(\theta)$, namely, the mean of $\tau$. However,
from the marginal distribution of $X_1, \ldots, X_n$, one can determine the
first n moments of $\tau$, so that for sufficiently large n and
sufficiently accurate information on the first n moments, $\tau$ may be
determined as accurately as desired, because it, being a bounded
distribution, is determined by its moments.


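2. A procedure for estimating $\tau$ is as follows. Let
$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then,

$$P(\bar X_n \le t) = \int_0^1 P_\theta(\bar X_n \le t)\,d\tau(\theta)$$

and

$$P_\theta(\bar X_n \le t) \to \begin{cases} 0 & \text{if } t < \theta \\ 1/2 & \text{if } t = \theta \\ 1 & \text{if } t > \theta. \end{cases}$$

Hence

$$P(\bar X_n \le t) \to \frac{\tau(t-) + \tau(t+)}{2} \stackrel{\text{def}}{=} \tau(t),$$

so that if n is chosen so large that
$|P(\bar X_n \le t) - \tau(t)| < \epsilon/2$, and $P(\bar X_n \le t)$ is
determined to within $\epsilon/2$, then $\tau(t)$ is determined to within
$\epsilon$, as the quantity $P(\bar X_n \le t)$.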
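
The convergence in item 2 can be illustrated numerically. The following sketch assumes a hypothetical three-point prior and evaluates the marginal probability that the sample mean is at most t exactly, as a mixture of binomial distribution functions; the function names and the prior are assumptions made only for this example, and t is taken at a point where the prior has no atom.

```python
import math

def binom_cdf(k, n, theta):
    """P(Binomial(n, theta) <= k), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if theta <= 0.0:
        return 1.0
    if theta >= 1.0:
        return 0.0
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(theta) + (n - j) * math.log(1 - theta))
        total += math.exp(log_term)
    return min(total, 1.0)

def marginal_cdf_of_mean(t, n, prior):
    """P(sample mean of n trials <= t) when theta is drawn from the prior.

    prior: list of (theta, weight) pairs with weights summing to 1.
    """
    k = math.floor(n * t + 1e-9)  # small slack against floating-point rounding
    return sum(w * binom_cdf(k, n, theta) for theta, w in prior)

def prior_cdf(t, prior):
    """Distribution function of the discrete prior at t."""
    return sum(w for theta, w in prior if theta <= t)

# Hypothetical prior, assumed for illustration only.
PRIOR = [(0.2, 0.3), (0.5, 0.5), (0.8, 0.2)]

if __name__ == "__main__":
    for n in (10, 50, 400):
        print(n, marginal_cdf_of_mean(0.35, n, PRIOR), prior_cdf(0.35, PRIOR))
```

As n grows the marginal probability settles on the prior mass below t, which is the sense in which the distribution of the sample mean recovers the prior.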

3. This problem is related to the problem of identifying and
estimating a mixing distribution. See for example Teicher [Ann. Math. Stat.,
1963, pp. 1265-1269]. The general problem is: given $F_\theta(x)$, to estimate
$\tau(\theta)$ from a sample from the distribution
$H(x) = \int F_\theta(x)\,d\tau(\theta)$. This corresponds to the above
problem for n = 1. The distribution $\tau$ can be so estimated for many
kernels, $F_\theta(x)$. See Gaffey [Ann. Math. Stat., 1959, pp. 198-205] for
the case where $F_\theta(x)$ is normal with mean $\theta$ and variance 1.
A student of mine, Carl Maltz, has found corresponding methods for other
families of distributions, to appear in his Ph.D. thesis.

4. The distribution with density

$$f(x \mid \theta) = \begin{cases} \tfrac{1}{2}(1 + \theta x) & \text{for } -1 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

where $-1 < \theta < 1$, like the Bernoulli, does not lend itself to estimating
the mixing distribution, even though there are an infinity of values,
$h(x) = \int f(x \mid \theta)\,d\tau(\theta)$, to be used. This is because
$h(x) = \tfrac{1}{2}\left(1 + x\int \theta\,d\tau(\theta)\right)$,
so again only the mean of $\tau$ may be determined. This corresponds to
n = 1.

If n is allowed to be arbitrary, then again $\tau$ may be estimated.
One procedure for accomplishing this is to transform to the Bernoulli
case, calling $X \ge 0$ a success, and $X < 0$ a failure. Then since the
distribution of the probability of success, $\frac{1}{2} + \frac{\theta}{4}$,
may be approximated, so may the distribution of $\theta$.
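
Both claims of item 4 can be checked directly. This is a sketch only; the function names and the two example priors are assumptions. It verifies that the mixture density depends on the prior only through its mean, and that the probability of a "success" X >= 0 equals 1/2 + theta/4.

```python
def density(x, theta):
    """f(x | theta) = (1 + theta * x) / 2 on (-1, 1), zero elsewhere."""
    return 0.5 * (1 + theta * x) if -1 < x < 1 else 0.0

def mixture_density(x, prior):
    """h(x) for a discrete prior given as (theta, weight) pairs."""
    return sum(w * density(x, theta) for theta, w in prior)

def success_probability(theta, steps=10_000):
    """P(X >= 0) by the midpoint rule on [0, 1]; exact for a linear integrand."""
    h = 1.0 / steps
    return sum(density((i + 0.5) * h, theta) for i in range(steps)) * h

# Two hypothetical priors with the same mean (zero) but different spread.
SPREAD = [(-0.5, 0.5), (0.5, 0.5)]
POINT = [(0.0, 1.0)]

if __name__ == "__main__":
    print(mixture_density(0.3, SPREAD), mixture_density(0.3, POINT))
    print(success_probability(0.6), 0.5 + 0.6 / 4)
```

A single observation therefore cannot separate the two priors, while repeated observations reduce the problem to the Bernoulli case with success probability 1/2 + theta/4.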

This procedure extends to all cases where $\theta$ is identifiable from
$F_\theta(x)$ for x in some denumerable set, D (i.e., when $\theta$ can be
found knowing the numbers $F_\theta(x)$ for x in D). For example, if the
sample space is Euclidean, and if $\theta$ is identifiable, then $\tau$ may
be approximated.


15  June  1966 


University  of  California, 
Los  Angeles 


KOICHI MIYASAWA


If I am not misunderstanding Professor Marschak's note, his issue
comes from the following postulate: in order that the personal
probability of an event Z have an operational meaning, the event should be a
real one. Here by a real event I mean one about which the person can
eventually know whether it obtains or not.

If we admit non-real events in determining their personal
probabilities by the choice behavior of the person among bets on these events,
then it seems to me that there is no need to raise a question on the
operational meaning of personal probabilities of probabilities.

I would like to admit that the reasonable person Mr. Smith can
contemplate an imaginary or non-real bet which is assumed to give a prize
to him if the true ratio p of the red balls in the urn lies between, say,
10% and 30%, even though he might never know the true value of p, and can
show his choice between that bet and a bet which is defined by means of
a clock's twirled hand. If we do not admit such bets on non-real events
in the operational definition of personal probability, I do not think the
personal probability, with respect to Mr. Smith who is going to board the
plane, of a crash of the plane can have an operational meaning, since
Mr. Smith cannot know whether the event Z that the plane will crash obtains
or not until he safely arrives at the destination by the plane or dies
by a crash of the plane.

21  May  1966  University  of  Tokyo 

... Although I have no answers to the questions posed by Marschak, I do
have a few comments to make and a few additional questions to pose. In
his comments Marschak suggests that the prior distribution function be
denoted by F. It has been suggested by many (e.g., see Good, 1965) that
F be a beta form, proportional to $p^{m_1}(1-p)^{m_2}$ where $m_1 > -1$ and
$m_2 > -1$. If this assumption be made, can the two parameters, $m_1$ and
$m_2$, of F be determined from $r_{t+1}$ using equation 3' of Marschak or
from equation 4'?

I think not. For example, suppose t = 0. Then since

$$\int_0^1 p^{m_1}(1-p)^{m_2}\,dp = \frac{m_1!\,m_2!}{(m_1 + m_2 + 1)!},$$

we have the probability that the first ball is red as

$$r_1 = \frac{m_1 + 1}{m_1 + m_2 + 2},$$

and, of course, the probability of not red is

$$\bar r_1 = \frac{m_2 + 1}{m_1 + m_2 + 2}.$$

Can one specify $r_1$ and $\bar r_1$ and from this determine $m_1$ and $m_2$?
The answer is obviously no, and thus the prior distribution function F is
still unknown.

Of course, if a red is observed on the first sample, the probability
of a red the second time is now

$$r_2(1) = \frac{m_1 + 2}{m_1 + m_2 + 3}$$

and the probability of a not red is

$$\bar r_2(1) = \frac{m_2 + 1}{m_1 + m_2 + 3}.$$
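
A short exact calculation illustrates the point about the beta form. The sketch below is an addition, not part of the letter; the function names are hypothetical. It shows two beta priors that agree on the first-draw probability of red, and then shows that the pair consisting of that probability and the probability of red after one red does pin down both exponents (an algebraic fact obtained by solving the two displayed formulas).

```python
from fractions import Fraction as Fr

def r1(m1, m2):
    """Probability that the first ball is red under the beta-form prior."""
    return (m1 + 1) / (m1 + m2 + 2)

def r2_after_red(m1, m2):
    """Probability of red on the second draw, given red on the first."""
    return (m1 + 2) / (m1 + m2 + 3)

def recover_exponents(r1_value, r2_value):
    """Solve the two formulas above for (m1, m2).

    Writing a = m1 + 1 and s = m1 + m2 + 2 gives r1 = a / s and
    r2 = (a + 1) / (s + 1), hence s = (1 - r2) / (r2 - r1).
    """
    s = (1 - r2_value) / (r2_value - r1_value)
    a = r1_value * s
    return a - 1, s - a - 1

if __name__ == "__main__":
    for m1, m2 in ((Fr(0), Fr(0)), (Fr(1), Fr(1))):
        print(m1, m2, r1(m1, m2), r2_after_red(m1, m2))
    print(recover_exponents(Fr(1, 2), Fr(3, 5)))
```

The first-draw probability alone leaves a one-parameter family of priors open; a second, conditional statement removes that freedom within the beta family.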


If now, the experimenter had been required to state his personal prob-

If we examine

$$r_{t+1}(y_t) = \frac{m_1 + y_t + 1}{m + t + 2}, \qquad \bar r_{t+1}(y_t) = \frac{m_2 + (t - y_t) + 1}{m + t + 2},$$

where $m = m_1 + m_2$, we see that $m_1$ and $m_2$ play a role similar to that
of $y_t$ and $t - y_t$ respectively. Can we conclude then that $m_1$ and $m_2$
represent the number of reds and non-reds that the experimenter thinks he
would have seen from a sample of size m? Or, since

$$r_1 = \frac{m_1 + 1}{m + 2},$$

if the experimenter is willing to specify his personal probability, $r_1$,
can we also let him specify m as his degree of confidence that this is
the correct r? If so, would m = 0 imply no confidence? For m = 0, $m_1$
and $m_2$ can vary from -1 to +1. It is a bit awkward to try to interpret
a value such as $m_1 = .5$ to be the number of red balls one can expect in
0 draws. Also, what is the interpretation of negative m?

If this is difficult, suppose the experimenter is unwilling or unable
to guess r but is willing to try to specify a prior distribution for r,
say $\phi(r)$. If this is done, will this be enough to specify F?

Do you suppose the following is fair: Suppose the experimenter has
no knowledge and no confidence. Thus, he may take $r_1 = 1/2$ and m = 0.
This implies that $m_1 = 0$ and $m_2 = 0$ and, therefore, the prior density is
the uniform, that is

$$f(p) = \begin{cases} 1, & p \in [0,1] \\ 0, & \text{otherwise.} \end{cases}$$

Suppose the first observation was red. Then we would expect

$$r_2(1) = 2/3.$$

But, suppose that after seeing the first observation the experimenter says
"oops, I think I goofed. I think $r_1$ should be something else, such as
$r_1 = .60$. Furthermore since I have some information from one observation
I will take m = 1." If he is allowed to do this, then we would allow him
to change $m_1$ and $m_2$ to

$$m_1 = .8, \qquad m_2 = .2;$$

this makes

$$f(p) = \frac{2!}{(.8)!\,(.2)!}\; p^{.8}(1 - p)^{.2},$$

which in turn gives

$$r_2(1) = \frac{2.8}{4} = .7.$$

Continuing, after observing the second observation should we allow him
to change his mind again regarding $r_1$?
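
The arithmetic of this example can be reproduced with the predictive rule for the beta form, under which the probability of red on the next draw is (m_1 + reds + 1) / (m_1 + m_2 + draws + 2). The function name below is hypothetical, and the sketch only restates the numbers already given in the text.

```python
from fractions import Fraction as Fr

def predictive_red(m1, m2, reds, draws):
    """Probability of red on the next draw after `reds` reds in `draws` draws,
    for a prior density proportional to p**m1 * (1 - p)**m2."""
    return (m1 + reds + 1) / (m1 + m2 + draws + 2)

if __name__ == "__main__":
    # Uniform prior (m1 = m2 = 0): first-draw probability, then after one red.
    print(predictive_red(Fr(0), Fr(0), 0, 0), predictive_red(Fr(0), Fr(0), 1, 1))
    # Revised exponents m1 = .8, m2 = .2: first-draw probability, then after one red.
    print(predictive_red(Fr(4, 5), Fr(1, 5), 0, 0), predictive_red(Fr(4, 5), Fr(1, 5), 1, 1))
```

Exact fractions are used so that the values 1/2, 2/3, 3/5 and 7/10 appear without rounding.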

A similarly interesting set of questions arises when urns do not
contain balls but disks with numbers, such that $p = (p_1, p_2, \ldots, p_k)$
is the vector of probabilities $p_j$ that a disk drawn at random is equal to
j, j = 1, 2, ..., k. It is evident that p is an element of the simplex

$$S = \Big\{p : p_j \ge 0,\ \sum_{j=1}^{k} p_j = 1\Big\}.$$

Then, suppose n observations are made, $x_1, x_2, \ldots, x_n$, of which $n_j$
are equal to j; that is, $n_j$ is the cardinality of the set
$\{x_i : x_i = j\}$. Then, the probability that the (n+1)st ball is j is

$$r_{n+1}(j) = \frac{\int_S p_1^{n_1} p_2^{n_2} \cdots p_j^{n_j + 1} \cdots p_k^{n_k}\,dF(p)}{\int_S p_1^{n_1} p_2^{n_2} \cdots p_j^{n_j} \cdots p_k^{n_k}\,dF(p)}.$$


Suppose as before we let f(p) be proportional to

$$p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k},$$

where $m_j > -1$ for all j. Then by the Dirichlet integral (Dirichlet,
Comp. Rend. Acad. Sci., 1839) we have

$$r_{n+1}(j) = \frac{m_j + n_j + 1}{k + m + n}, \qquad j = 1, 2, \ldots, k,$$

where $m = \sum_{j=1}^{k} m_j$. Then,

$$r_0(j) = \frac{m_j + 1}{k + m}.$$

We could give values to $r_0(j)$, j = 1, 2, ..., k, but as before, this would
not give F(p). If we could specify the personal probabilities $r_0(j)$ and
also the confidence value m, then of course we know F.
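
The multinomial case can be sketched in the same way. The function names are assumptions introduced here; the formulas are the ones displayed above. The first function gives the predictive probabilities after observing counts, and the second recovers the exponents from the initial probabilities together with the confidence value m, using m_j = r_0(j)(k + m) - 1.

```python
from fractions import Fraction as Fr

def dirichlet_predictive(m, counts):
    """r_{n+1}(j) = (m_j + n_j + 1) / (k + m + n) for each category j."""
    k = len(m)
    total = Fr(k + sum(m) + sum(counts))
    return [(mj + nj + 1) / total for mj, nj in zip(m, counts)]

def exponents_from(r0, m_total):
    """Recover m_j from the initial probabilities r0(j) and m = sum of m_j."""
    k = len(r0)
    return [rj * (k + m_total) - 1 for rj in r0]

if __name__ == "__main__":
    print(dirichlet_predictive([0, 0, 0], [0, 0, 0]))  # complete ignorance, k = 3
    print(dirichlet_predictive([0, 0, 0], [2, 1, 0]))  # after three observations
    print(exponents_from([Fr(1, 2), Fr(1, 4), Fr(1, 4)], 1))
```

The predictive probabilities always sum to one, and the recovered exponents always sum to the stated m, which is a convenient consistency check on any proposed pair of inputs.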

Suppose our personal feelings are only that $r_0(j)$ can be
approximated quite nicely by a normal distribution, that is,

$$r_0(j) = \int_{j-1/2}^{j+1/2} \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x - \mu)^2 / (2\sigma^2)}\,dx.$$

Indeed, this is suggested by J. Heller (1960) for the scheduling problem.
If $\mu$ and $\sigma^2$ are not known, can we take n observations, find the
sample mean $\bar x_n$ and variance $s_n^2$ as the maximum likelihood
estimates of $\mu$ and $\sigma^2$, respectively, and use these for computing
good values of $r_0(j)$? Perhaps, this is feasible when one wishes to use
these prior distributions for tests of hypotheses, such as stopping rules.

On the other hand, would we want to digress one step and require
the experimenter to specify a prior distribution for $\mu$ and $\sigma$? What
kind of an operational meaning could we ascribe to these probabilities? I do
not know.

One last remark: I have developed stopping rules for the multinomial.
For the case of complete ignorance ($r_j = 1/k$, j = 1, ..., k) and no
confidence (m = 0) I found that the convergence was exasperatingly slow. I
certainly cannot show that it is right. I am merely using the age old
engineering motto "if it works, it is right." So far, the results are
very encouraging.


26  September  1966 


New  Mexico  State  University 


LEONARD  J.  SAVAGE 

Almost all that you inquire about has long been well studied under
the rubric of exchangeable processes. A now obsolete term is sequences
of equivalent events. Not the earliest, but one of the most thorough
and important references, is de Finetti's masterpiece, "La Prévision" (1937),
in the Annales de l'Institut Henri Poincaré, which was translated and brought
up to date under de Finetti's supervision in the anthology edited by Kyburg
and Smokler (1964). Section 3.7 of my book, The Foundations of Statistics
(1954), is devoted to the same topic. Though these references should put you
in almost complete possession of the facts, and though others to whom
you circulated your manuscript, "Do personal probabilities of
probabilities have an operational meaning," may already have covered the ground
pretty well, I shall make some of the obvious applications of the theory
of exchangeable sequences of events to various passages of your manuscript.

1. With reference to your Item 2, many Bayesian statisticians,
especially those directly influenced by de Finetti, recognize clearly
that references to unknown $\theta$ may, depending on the context, be merely
a figurative way of describing a sequence of dependent random variables,
or something of the sort.

If, for example, I say that the $x_i$ are normally distributed with
unit variance around $\theta$ and that $\theta$ is for me normally distributed
around $\mu$ with standard deviation $\sigma$, I may mean only that the x's
are distributed as they would be were there an actual physical constant
$\theta$ about which my opinion was as described, and with knowledge of which,
the $x_i$'s would for me have independent normal unit distributions about
$\theta$. The actual upshot of this is that the $x_i$'s for me are variables
with a joint normal distribution such that each has mean $\mu$ and variance
$1 + \sigma^2$, and such that the covariance between pairs of the $x_i$'s is
$\sigma^2$.
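
The stated mean, variance, and covariance can be checked by simulating the "as if" description: draw a fictional constant from the normal opinion, then draw two observations around it. This is a sketch with hypothetical function names and arbitrarily chosen parameter values, not anything taken from the letter.

```python
import random

def simulate_pairs(mu, sigma, n_draws, seed=0):
    """Pairs (x_1, x_2): theta ~ N(mu, sigma^2), then x_i ~ N(theta, 1) independently."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_draws):
        theta = rng.gauss(mu, sigma)
        pairs.append((rng.gauss(theta, 1.0), rng.gauss(theta, 1.0)))
    return pairs

def sample_moments(pairs):
    """Sample mean and variance of x_1, and sample covariance of (x_1, x_2)."""
    n = len(pairs)
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    var1 = sum((a - mean1) ** 2 for a, _ in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs) / n
    return mean1, var1, cov

if __name__ == "__main__":
    # Assumed values mu = 1, sigma = 2: expect mean 1, variance 5, covariance 4.
    print(sample_moments(simulate_pairs(1.0, 2.0, 100_000)))
```

The observations are dependent for the person holding the opinion (covariance equal to the prior variance), even though they would be independent given the fictional constant.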

2. Your Item 3. Some of us Bayesians believe that there is no
intellectual need to, or possibility of, introducing any other kind of
probability than personal probability. That is not a thesis to be argued
here; for the moment I want only to point out that the thesis cannot
even be heard, let alone judged, without an understanding of
exchangeable events.

To illustrate, for us, the situation alluded to in your Item 6 is
this. In certain cases, a person will judge a sequence of events to be,
for him, both exchangeable and independent with a specified probability.
These assumptions together with the rules of logic imply all the
probabilities associated with this sequence of events. These probabilities
are "objective" in the trivial sense that any person who shares the
original person's underlying opinions will share with him the
implications of those opinions; that can be said about any system of opinions.

3.  Your  Item  7.  Historical  questions  can  be  delicate  and  dangerous. 
I  have  read  Bayes'  paper  but  discerned  no  evidence  that  Bayes  regarded 
probability  as  personal,  or  subjective.  It  would  be  worth  some  trouble 
to  document  the  point  one  way  or  the  other. 

4. Your Item 9. The indirect method initiated here is a little
more complicated and confusing than need be. It would be enough to ask
the person once and for all for his personal probability that all k of
the first k balls drawn will be red, for each k from 1 through n-1.
Calling these numbers r(k), the numbers $f_i$ are related to the known
numbers r(k), $p_1, \ldots, p_n$ by the relations

$$(2')\qquad r(k) = \sum_{i=1}^{n} f_i\,p_i^k, \qquad k = 0, \ldots, n-1,$$

where it is to be understood that the heretofore undefined r(0) is 1.
If none of the $p_i$ are equal to each other, this system of n linear
equations and unknowns has a unique solution, according to the theory
of the Vandermonde determinant. Just when it has a nonnegative
solution, I do not know but could perhaps look up.
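
The linear system (2') can be solved exactly for a small case. The sketch below is illustrative only: the function names, the three urn compositions, and the weights are assumptions. It generates r(0), ..., r(n-1) from known weights and then recovers the weights by Gauss-Jordan elimination in exact rational arithmetic.

```python
from fractions import Fraction as Fr

def moments_from_weights(f, p, n):
    """r(k) = sum_i f_i * p_i**k for k = 0, ..., n-1."""
    return [sum(fi * pi ** k for fi, pi in zip(f, p)) for k in range(n)]

def solve_weights(r, p):
    """Solve sum_i f_i * p_i**k = r(k), k = 0..n-1, for the weights f_i.

    Exact Gauss-Jordan elimination on the Vandermonde system; the p_i must
    be distinct, otherwise no pivot is found and StopIteration is raised.
    """
    n = len(p)
    rows = [[Fr(pi) ** k for pi in p] + [Fr(r[k])] for k in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for i in range(n):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[col])]
    return [rows[i][n] for i in range(n)]

# Hypothetical urn compositions and weights, assumed for illustration.
P = [Fr(1, 4), Fr(1, 2), Fr(3, 4)]
F = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]

if __name__ == "__main__":
    r = moments_from_weights(F, P, len(P))
    print(r)
    print(solve_weights(r, P))
```

Feeding in stated values r(k) that do not come from any genuine weights still yields a unique solution, but some of its entries may then be negative, which is the open point mentioned above.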

A deeper analysis, more in the spirit of your (3'), is that the
infinite sequence of r(k) constitute the moments of F, which according
to Hausdorff are sufficient to characterize F.

Such quantities as those in the numerator and in the denominator
of (3') are obviously inferable from moments, and of course have the
moments as special cases. Not just any sequence of numbers r(k) can
be the moments of a distribution; your (1) corresponds to the monotony
of F, which must be respected. It has been known at least since Hausdorff
that the necessary and sufficient condition for this is that the
quantities that appear in (3') all be nonnegative. This is commonly expressed
in two ways as follows:

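$$0 \le E\big(p^a (1 - p)^b\big) = E\Big(\sum_c \binom{b}{c} (-1)^c p^{a+c}\Big) = \sum_c (-1)^c \binom{b}{c}\, r(a + c) = Q(a,b).$$

But Q(a,b) is the probability of a reds followed by b blacks. Whence
Q(a,b) = Q(a+1,b) + Q(a,b+1). Therefore, writing $\Delta$ for the forward
difference in the first argument,

$$Q(a, b+1) = Q(a,b) - Q(a+1,b) = -\Delta Q(a,b),$$

$$Q(a,0) = r(a) \ge 0,$$

$$Q(a,1) = -\Delta r(a) \ge 0,$$

$$Q(a,2) = \Delta^2 r(a) \ge 0,$$

$$\cdots$$

$$Q(a,b) = (-1)^b \Delta^b r(a) \ge 0.$$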
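
The nonnegativity conditions are easy to test for a proposed sequence of stated probabilities. In this sketch the function names, the uniform-prior moments r(k) = 1/(k+1), and the deliberately incoherent sequence are all assumptions chosen for illustration; Q(a,b) is computed from the alternating-sum formula in terms of the r(k).

```python
from fractions import Fraction as Fr
from math import comb

def q(a, b, r):
    """Q(a,b) = sum over c of (-1)**c * C(b,c) * r(a+c)."""
    return sum((-1) ** c * comb(b, c) * r(a + c) for c in range(b + 1))

def is_coherent(r, depth):
    """Check Q(a,b) >= 0 for all a + b <= depth (uses r(0), ..., r(depth))."""
    return all(q(a, b, r) >= 0
               for a in range(depth + 1)
               for b in range(depth + 1 - a))

def uniform_moment(k):
    """Moments of the uniform prior on [0, 1] (assumed example)."""
    return Fr(1, k + 1)

def incoherent(k):
    """Stated values with r(2) > r(1); no distribution F can produce them."""
    return [Fr(1), Fr(1, 2), Fr(3, 5)][k]

if __name__ == "__main__":
    print(q(1, 1, uniform_moment), q(2, 1, uniform_moment), q(1, 2, uniform_moment))
    print(is_coherent(uniform_moment, 6), is_coherent(incoherent, 2))
```

For the uniform prior the values satisfy Q(a,b) = Q(a+1,b) + Q(a,b+1), as the argument requires, while the incoherent sequence fails already at Q(1,1) = r(1) - r(2).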


5. End of your Item 9. As has already been implied, the entire
sequence of r(k) is sufficient to determine F. Your $r_t$ are a random
sequence, but the first T elements of it do always exactly determine
the first T moments r(k). Therefore the entire sequence of $r_t$, exactly
like the r(k), do determine the F, and it is actually true that less
and less latitude for F is available as T increases (see Shohat and
Tamarkin, The Problem of Moments, p. 77 ff., Amer. Math. Society, New
York, 1950). Occasionally, a finite number of r(k), and therefore of
$r_t$, is sufficient to determine F exactly. If, for example,
$r(2) = r(1)^2$, then necessarily F is entirely concentrated at the point
r(1), so $r(k) = r(1)^k$ for all k. Again, if the person feels certain that
the balls are either all black or all red, that will be promptly revealed
by the condition that r(1) = r(2). I think of a few other exceptional cases
by combining the ones already mentioned. The general situation can probably
be dug out of the book by Shohat and Tamarkin.

6. Your Items 10 and 11. The probability that a specific number
$y_t$ of balls will be red in the first t drawings is, except for an
uninteresting binomial-coefficient factor, the same as the probability that a
specific subset of balls with $y_t$ members will constitute exactly the
red ones. These latter numbers, my Q(a,b), are for some purposes easier
to deal with, because the binomial coefficient is left out. The question
raised in your Item 11 is now seen to be the same as the one discussed
in my preceding point.
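The relation between the two kinds of probability can be sketched as follows. This is an illustration under stated assumptions, not a transcription of anything in the letter: r(k) is taken to be the k-th moment of F, F is taken to be uniform on [0, 1], and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def q_from_moments(r, a, b):
    """Probability of one specific sequence with counts (a, b): (-1)^a Delta^a r(b)."""
    return sum((-1) ** j * comb(a, j) * r(b + j) for j in range(a + 1))


def prob_red_count(r, y, t):
    """Probability of exactly y reds in the first t drawings, in any order.

    This is Q(y, t - y) multiplied by the number C(t, y) of orderings.
    """
    return comb(t, y) * q_from_moments(r, y, t - y)


def uniform_moment(k):
    """k-th moment of the uniform distribution on [0, 1] (illustrative F)."""
    return Fraction(1, k + 1)
```

Under the uniform F every count y = 0, ..., t comes out equally probable, with probability 1/(t+1), and the probabilities sum to one.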

7. It is important to realize that the question with which your
note ends is to be answered in the negative. No new postulates are needed.
If a person associates probabilities with a sequence of events $x_t$, and
if the events are exchangeable, that is if the probability for the person
of any finite pattern of successes and failures depends only on the
numbers of successes and failures involved and not on their order, then
(and only then) will there be an underlying F. The entire system of
inequalities needed is only that the Q(a,b) be nonnegative, but Q(a,b)
is simply the probability for the person of a specific sequence involving
a successes and b failures.

15  June  1966  Yale  University 

ROBERT SCHLAIFER

If we are talking about subjective probabilities as a part of a
methodology for thinking through to decisions and not as part of a model
for predicting decisions, then I am afraid that my reaction to your
discussion of Mr. de Groot's talk is that I just cannot see or feel that
there is any real problem to be discussed.

First, it seems to me clear that many probabilities for observable
events cannot be verified by actually observing betting behavior. If I
think that promotional campaign A gives a 1/2 chance of "success" while
promotional campaign B gives a 3/4 chance of success, and if after taking
costs into account I analyze my decision problem and decide to use strategy
A, you cannot observationally verify the subjective probability 3/4 that
I assigned to success with promotional campaign B. But when I am analyzing
my decision problem I feel that both my probabilities have exactly
the same kind of meaning to me; and since I am making the decision, I
don't really care at all whether some observer will or will not be able
to "verify" one, both, or neither of my probabilities after my decision
has been made.

As regards distributions of unknowable parameters of processes such
as the Bernoulli, I am again unable to feel any real problem for a
decision maker as opposed to a decision observer. I can imagine betting on
the event r successes in n trials just as well when I know that the n
trials will not really be made as I can when I know that they will really
be made; the only thing that counts is the fact that they have not yet
been made. The meaning to me of my probability for r heads in n trials
does not depend at all on whether the trials will actually be made. I'm
not quite so sure that I know what I mean by a fraction p of heads in an
infinite number of trials, but I can always think about the fraction p'
in a googolplex of trials and then argue purely mathematically that I
will make no error of practical interest if I assign to the parameter p
the same distribution that I assessed for p'.

You are more than welcome to make any use you like of these remarks,
for the triviality of which I apologize.

27  June  1966  Harvard  University 

ROBERT  L.  WINKLER 


Please excuse my delay in responding to your "Round Robin." It
seems to me that the verbal approach need not be rejected, although it
apparently has been rejected in the development of the personalistic
theory (de Finetti, 1937; Savage, 1954). The criticism of the verbal
approach, as I see it, centers around the claim that we have no way of
knowing if the verbal answers are in accordance with the assessor's
beliefs and judgments; indeed, the assessor does not necessarily have
any incentive to make his answers correspond with his beliefs and judgments.

In light of recent work by de Finetti regarding scoring rules, or
penalty functions, it appears that personal probabilities can be given
an operational meaning in terms of verbal answers. If a reward (or
punishment) is determined from the assessor's answers according to some
scoring rule which is so constructed as to oblige the assessor to make
his answers be in accordance with his beliefs and judgments, then the
element of incentive is present and the criticism removed.
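To illustrate what "so constructed" can mean, here is a minimal sketch using the quadratic penalty, a standard example of such a rule. The letter does not say which rule is intended, so the choice is an assumption, and all function names are hypothetical. The event is coded x = 1 if it occurs and x = 0 otherwise; the assessor states a probability and pays the penalty once x is known.

```python
from fractions import Fraction


def expected_quadratic_penalty(stated, belief):
    """Expected value of (x - stated)^2, where x = 1 with probability belief."""
    return belief * (1 - stated) ** 2 + (1 - belief) * stated ** 2


def expected_linear_penalty(stated, belief):
    """Expected value of |x - stated| for the same event (stated in [0, 1])."""
    return belief * (1 - stated) + (1 - belief) * stated


def best_statement(penalty, belief, grid_size=100):
    """Stated probability, on a grid, that minimizes the expected penalty."""
    candidates = [Fraction(i, grid_size) for i in range(grid_size + 1)]
    return min(candidates, key=lambda s: penalty(s, belief))
```

With a belief of 3/4 the quadratic penalty is minimized by stating 3/4, so honesty is the assessor's best policy. The linear penalty is minimized by stating 1, which shows that not every penalty function supplies the required incentive.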

Unfortunately, with the penalty functions, as with the betting rules,
knowledge of an "actual value" is necessary to implement the methods.
The "actual value" is needed to determine the winner of the bet or to
determine the score obtained (and the resulting reward or punishment)
through the penalty functions. Unless other penalty functions (e.g.,
dependent upon sample results) can be developed, then, we are still faced
with the problems discussed in your note.

A final note is that the problem of "actual values" seems to have
been ignored in the modification presented in your Item 10. If no actual
drawings are to take place, what incentive does the assessor have to make
careful assessments? And if this is so, why is this any improvement over
the verbal approach? Of course, the assessor might take the matter
seriously for a time; but eventually he would find that no drawings were
to be made, and he might then lose interest. In this case (the
modification), it seems that the threat of actual drawings is used to
provide incentive. Otherwise, this case could be extended to the case
where no actual drawings are possible.

These comments have been scattered and not directly related to the
actual questions posed in the note. Nevertheless, if you think they are
of any value, feel free to circulate them in the "Round Robin." Since
I am most interested in the questions posed (and left unanswered by me),
I look forward to reading the comments of other participants.


10 August 1966  Indiana University, Bloomington


NOTES

1. On the Colloquium, now 14 years in existence, see J. Marschak (1972).

2.  I  follow  here,  in  essence,  a  lecture  delivered  at  Columbia  University 
in  1950  and  published  in  1954.  See  Bibliography. 

3. This is meant to apply to "stochastic" theories of choice (see also
p. 99 of Block and Marschak (1960)); but it applies also to
non-stochastic models.

4. Only finite sets of events have been considered here and, for example,
in J. Marschak (1968), p. 49; (1970), Section 6; (1973); and J. Marschak
and R. Radner (1972), Chapter II, Sections 8-11.

5. The procedure has been actually applied by Staël von Holstein (1970).
See also Savage (1971).